\section{Lower bound for deterministic online token-forwarding algorithms}
\label{sec:lower}
In this section, we give an $\Omega(kn/\log n)$ lower bound on the
number of rounds needed by any deterministic online token-forwarding
algorithm for the $k$-gossip problem.  Our lower bound applies to even
centralized algorithms and a large class of initial token
distributions.  We first describe the adversary strategy.

\smallskip
\noindent 
{\em Adversary:} The strategy of the adversary is simple.  We use the
notion of {\em free edge}\/ introduced in~\cite{kuhn+lo:dynamic}.  In
a given round $r$, we call an edge $(u,v)$ to be a free edge if at the
start of round $r$, $u$ has the token that $v$ broadcasts in the round
and $v$ has the token that $u$ broadcasts in the round\footnote{For
  convenience, when a node does not broadcast any token we will view
  it as broadcasting a special {\em empty}\/ token that every node
  has.  This allows us to avoid treating the empty broadcast as a
  special case.}; an edge that is not free is called {\em non-free}.
Thus, if $(u,v)$ is a free edge in a particular round, neither $u$ nor
$v$ can gain any new token through this edge in the round.  Since we
are considering deterministic token-forwarding algorithms, at the
start of each round, the adversary knows for each node $v$, the token
(if any) that $v$ will broadcast in that round.  In round $r$, the
adversary constructs the communication graph $G_r$ as follows.  First,
the adversary adds all the free edges to $G_r$.  Let
$C_1,C_2,\dots,C_l$ denote the connected components thus formed. The
adversary then guarantees the connectivity of the graph by selecting
an arbitrary node in each connected component and connecting them in a
line.  Figure \ref{fig:adversary} illustrates the construction.

The network $G_r$ thus constructed has exactly $l - 1$ non-free edges,
where $l$ is the number of connected components formed by the free
edges of $G_r$.  If $(u,v)$ is a non-free edge in $G_r$, then $u$,
$v$, or both will gain at most new token through this edge. We refer
to such a token exchange on a non-free edge as a {\em useful token
  exchange}.

We bound the running-time of any token-forwarding algorithm by
identifying a critical structure that quantifies the progress made in
each round.  We say that a sequence of nodes $v_1,v_2,\ldots,v_k$ is
{\em half-empty}\/ in round $r$ with respect to a sequence of tokens
$t_1,t_2,\ldots,t_k$ if the following condition holds at the start of
round $r$: for all $1 \le i,j \le k$, $i \neq j$, either $v_i$ is
missing $t_j$ or $v_j$ is missing $t_i$.  We then say that $\langle
v_i \rangle$ is half-empty with respect to $\langle t_i \rangle$ and
refer to the pair $(\langle v_i \rangle, \langle t_i \rangle)$ as a
half-empty configuration of size $k$.

\begin{figure}[ht]
\begin{center}
\includegraphics[width=5in]{./figures/adversary.jpg}
\caption{The network constructed by the adversary in a particular
  round.  Note that if node $v_i$ broadcasts token $t_i$, then the
  $\langle v_i \rangle$ forms a half-empty configuration with respect
  to $\langle t_i \rangle$ at the start of this round.}
\label{fig:adversary}
\end{center}
\end{figure}

\junk{To prove the lower bound, we consider a special starting state, where
each node $u$ has token $t_i$ with probability $3/4$ for all $i$. Then
we look for certain structure in this starting state, and argue that
with such structure the number of useful token exchanges is $O(\log
n)$ with high probability. Furthermore, we argue that such structure
will always exist in the following rounds of communication. That
means, the number of useful token exchanges will be $O(\log n)$ in
every round. Since each node $u$ has any token with probability $3/4$
at the starting point, the expected number of missing token is $n\cdot
k/4 = \Omega(kn)$. In fact, we argue the number of missing token is
$\Omega(kn)$ with high probability using Chernoff bound. Thus, this
will imply any centralized deterministic algorithm runs in
$\Omega(kn/\log n)$ rounds.

First, we draw the connection between the number of useful token
exchanges and the existence of certain structure in Lemma
\ref{lem:free+edge}. Then in Lemma \ref{lem:alg+lower} we show the
$\Omega(kn/\log n)$ lower bound if the token dissemination process
starts from the state where each node $u$ has token $t_i$ with
probability $3/4$ for all $i$. Lastly, in Theorem
\ref{thm:alg+lower.single} we prove our lower bound.
\begin{lemma}
\label{lem:free+edge}
If $m$ or more useful token exchanges occur, then there exist $m/2+1$
nodes $v_1,v_2,\dots,v_{m/2+1}$ and $m/2+1$ tokens
$t_1,t_2,\dots,t_{m/2+1}$ ($v_i$ broadcasts token $t_i$) such that,
for all $i<j$, either $v_i$ is missing $t_j$ or $v_j$ is missing
$t_i$.
\end{lemma}
}

\begin{lemma}
\label{lem:free+edge}
If $m$ useful token exchanges occur in round $r$, then there exists a
half-empty configuration of size at least $m/2 + 1$ at the start of
round $r$.
\end{lemma}
\begin{proof}
Consider the network $G_r$ in round $r$.  Each non-free edge can
contribute at most 2 useful token exchanges. Thus, there are at least
$m/2$ non-free edges in the communication graph.  Based on the
adversary we consider, no useful token exchange takes place within the
connected components induced by the free edges. Useful token exchanges
can only happen over the non-free edges between connected
components. This implies there are at least $m/2+1$ connected
components in the subgraph of $G_r$ induced by the free edges.  Let
$v_i$ denote an arbitrary node in the $i$th connected component in
this subgraph, and let $t_i$ be the token broadcast by $v_i$ in round
$r$.  For $i \neq j$, since $v_i$ and $v_j$ are in different connected
components, $(v_i,v_j)$ is a non-free edge in round $r$; hence, at the
start of round $r$, either $v_i$ is missing $t_j$ or $v_j$ is missing
$t_i$.  Thus, the sequence $\langle v_i \rangle$ of nodes of size at
least $m/2 + 1$ is half-empty with respect to the sequence $\langle
t_i \rangle$ at the start of round $r$.
\end{proof}
An important point to note about the definition of a semi-anti-clique
is that it only depends on the token distribution; it is independent
of the broadcast in any round.  This allows us to prove the following
easy lemma.
\begin{lemma}
\label{lem:monotone}
If a sequence $\langle v_i \rangle$ of nodes is half-empty with
respect to $\langle t_i \rangle$ at the start of round $r$, then
$\langle v_i \rangle$ is half-empty with respect to $\langle t_i
\rangle$ at the start of round $r'$ for any $r' \le r$.
\end{lemma}
\begin{proof}
The lemma follows immediately from the fact that if a node $v_i$ is
missing a token $t_j$ at the start of round $r$, then $v_i$ is missing
token $t_j$ at the start of every round $r' < r$.
\end{proof}
Lemmas~\ref{lem:free+edge} and~\ref{lem:monotone} suggest that if we
can identify a token distribution in which all half-empty
configuration are small, we can guarantee small progress in each
round.  We now show that there are many token distributions with this
property, thus yielding the desired lower bound.
\begin{theorem}
\label{thm:alg+lower}
From an initial token distribution in which each node has each token
independently with probability $3/4$, any deterministic online
token-forwarding algorithm will need $\Omega(kn/\log n)$ rounds to
complete with high probability.
\end{theorem}
\begin{proof}
We first note that if the number of tokens $k$ is less than $5\log n$,
then the number of rounds needed is at least $\Omega(n)$ because even
to disseminate one token will take $\Omega(n)$ rounds in the
worst-case.  The $\Omega(kn/\log n)$ lower bound is trivially true in
this case. Thus, in the following proof, we focus on the case where
$k\ge 5\log n$.

Let $E_l$ denote the event that there exists a half-empty
configuration of size $l$ at the start of the first round.  For $E_l$
to hold, we need $l$ nodes $v_1,v_2,\dots,v_l$ and $l$ tokens
$t_1,t_2,\dots,t_l$ such that for all $i\neq j$ either $v_i$ is
missing $t_j$ or $v_j$ is missing $t_i$. For a pair of nodes $u$ and
$v$, by union bound, the probability that $u$ is missing $t_v$ or $v$
is missing $t_u$ is at most $1/4+1/4 = 1/2$. Thus, the probability of
$E_l$ can be bounded as follows.
\[ \prob{E_l} \le {n \choose l}\cdot \frac{k!}{(k-l)!}\cdot \rb{\frac{1}{2}}^{l \choose 2} \le n^l \cdot k^l \frac{1}{2^{l(l-1)/2}} \le \frac{2^{2l\log n}}{2^{l(l-1)/2}} \]
When $l\ge 5\log n$, $\prob{E_l} \le 1/n^2$.  Thus, the largest
semi-anti-clique at the start of the first round is of size at most $5
\log n$.  It follows that the largest half-empty configuration at the
start of any round is of size at most $5 \log n$.  By
Lemma~\ref{lem:free+edge}, we thus obtain that the number of useful
token exchanges in each round is at most $10 \log n$, with probability
at least $1 - 1/n^2$.

\junk{Now we argue, in each following rounds, the number of useful token
exchanges is also no more than $2(l-1)$ with high probability, where $l\ge
5\log n$. If there are $2(l-1)$ or more useful token exchanges in
round $r$ where $r>1$, then by Lemma \ref{lem:free+edge} there exist
$l$ nodes $v_1,v_2,\dots,v_l$ and $l$ tokens $t_1,t_2,\dots,t_l$ such
that for all $i \neq j$ either $v_i$ is missing $t_j$ or $v_j$ is
missing $t_i$. Then at the beginning of round 1, the condition that
for all $i \neq j$ either $v_i$ is missing $t_j$ or $v_j$ is missing
$t_i$ still holds, and this can happen only with probability $1/n^2$.}

Let $M_i$ be the number of tokens that node $i$ is missing in the
initial distribution. Then $M_i$ is a binomial random variable with
$\expect{M_i}=3k/4$.  By a straightforward Chernoff bound, we have the
probability that node $i$ misses less than $k/8$ tokens is
\[\prob{M_i\ge \frac{3k}{4}+\frac{k}{8}} = \prob{M_i\ge \rb{1+\frac{1}{6}}\cdot \expect{M_i}} \le e^{-\expect{M_i}\rb{\frac{1}{6}}^2 \frac{1}{2}} = e^{-\frac{3k}{4}\rb{\frac{1}{6}}^2\frac{1}{2}} = \frac{1}{e^{96k}}\] Therefore, the total number of tokens missing in the initial
distribution is at least $n\cdot k/8 = \Omega(kn)$ with probability at
least $1-n/e^{96k}\ge 1-1/n^2$ ($k\ge 5\log n$).  Since the number of
useful tokens exchanged in each round is at most $10 \log n$, the
number of rounds needed to complete $k$-gossip is $\Omega(kn /\log n)$
with high probability.
\end{proof}
Theorem~\ref{thm:alg+lower} does not apply to certain natural initial
distributions, such as one in which each token resides at exactly one
node.  While this class of token distributions has far fewer tokens
distributed initially, the argument of Theorem~\ref{thm:alg+lower}
does not rule out the possibility that an algorithm, when starting
from a distribution in this class, avoids the problematic
configurations that arise in the proof.  The following theorem extends
the lower bound to this class of distributions.
\begin{theorem}
\label{thm:alg+lower.single}
From any distribution in which each token starts at exactly one node,
any deterministic online token-forwarding algorithm for $k$-gossip
needs $\Omega(kn/\log n)$ rounds.
\end{theorem}
\begin{proof}
We consider an initial distribution $C$ where each token is at exactly
one node, and no node has more than one token.  Our argument can be
easily extended to the case where nodes may have multiple tokens (we
defer its proof to the full paper).  Let $C^*$ be an initial token
distribution from which any deterministic online algorithm needs
$\Omega(kn/\log n)$ rounds.  The existence of $C^*$ follows from
Theorem~\ref{thm:alg+lower}.  We construct a bipartite graph on two
copies of $V$, $V_1$ and $V_2$. A node $v \in V_1$ is connected to a
node $u \in V_2$ if in $C^*$ $u$ has all the tokens that $v$ has in
$C$.  We will show below that this bipartite graph has a perfect
matching.

Given a perfect matching $M$, we can complete the proof as follows.
For $v\in V_2$, let $M(v)$ denote the node in $V_1$ that got matched
to $v$.  If there is an algorithm $A$ that runs in $T$ rounds from
starting state $C$, then we can construct an algorithm $A^*$ that runs
in the same number of rounds from starting state $C^*$ as
follows. First every node $v$ deletes all its tokens except for those
which $M(v)$ has in $C$. Then algorithm $A^*$ runs exactly as $A$.
Thus, the lower bound of Theorem~\ref{thm:alg+lower}, which applies to
$A^*$, also applies to $A$.

It remains to prove that the above bipartite graph has a perfect
matching.  This follows from an application of Hall's Theorem.
Consider a set of $m$ nodes in $V_2$. We want to show their
neighborhood in the bipartite graph is of size at least $m$. We show
this condition holds by the following 2 cases. If $m < 3n/5$, let
$X_i$ denote the neighborhood size of node $i$. We know $\expect{X_i}
\ge 3n/4$. Then by Chernoff bound
\[\prob{X_i < m} \le \prob{X_i < 3n/5} \le e^{-\frac{\rb{1/5}^2 \expect{X_i}}{2}} = e^{-\frac{3n}{200}}\]
By union bound with probability at least $1-n\cdot e^{-3n/200}$ the
neighborhood size of every node is at least $m$. Therefore, the
condition holds in the first case. If $m \ge 3n/5$, we argue the
neighborhood size of any set of $m$ nodes is $V_1$ with high
probability. Consider a set of $m$ nodes, the probability that a given
token $t$ is missing in all these $m$ nodes is $(1/4)^m$. Thus the
probability that any token is missing in all these nodes is at most
$n(1/4)^m \le n(1/4)^{3n/5}$. There are at most $2^n$ such sets. By
union bound, with probability at least $1-2^n\cdot n(1/4)^{3n/5} =
1-n/2^{n/5}$, the condition holds in the second case.  
\end{proof}
